Methods and apparatus for identifying related nodes in a directed graph having named arcs

ABSTRACT

The invention provides methods for identifying related data in a directed graph (e.g., an RDF data set). A “first” step (though the steps are not necessarily executed in sequential order) includes identifying (or marking) as related data expressly satisfying a criteria. A “second” step includes identifying as related ancestors of any data identified as related, e.g., in the first step, unless that ancestor conflicts with the criteria. A “third” step of the method is identifying descendents of any data identified, e.g., in the prior steps, unless that descendent conflicts with the criteria or has a certain relationship with the ancestor from which it descends. The method generates, e.g., as output, an indication of each of the nodes identified as related in the three steps.

BACKGROUND OF THE INVENTION

[0001] The invention pertains to digital data processing and, more particularly, to methods and apparatus for identifying subsets of related data in a data set. The invention has application, for example, in enterprise business visibility and insight using real-time reporting tools.

[0002] It is not uncommon for a single company to have several database systems (separate systems not interfaced) to track internal and external planning and transaction data. Such systems might have been developed at different times throughout the history of the company and are therefore of differing generations of computer technology. For example, a marketing database system tracking customers may be ten years old, while an enterprise resource planning (ERP) system tracking inventory might be two or three years old. Integration between these systems is difficult at best, consuming specialized programming skill and constant maintenance expenses.

[0003] A major impediment to enterprise business visibility is the consolidation of these disparate legacy databases with one another and with newer databases. For instance, inventory on-hand data gleaned from a legacy ERP system may be difficult to combine with customer order data gleaned from web servers that support e-commerce (and other web-based) transactions. This is not to mention difficulties, for example, in consolidating resource scheduling data from the ERP system with the forecasting data from the marketing database system.

[0004] Even where data from disparate databases can be consolidated, e.g., through data mining, directed queries, brute-force conversion and combination, or otherwise, it may be difficult (if not impossible) to use. For example, the manager of a corporate marketing campaign may be wholly unable to identify relevant customers from a listing of tens, hundreds or even thousands of pages of consolidated corporate ERP, e-commerce, marketing and other data.

[0005] An object of this invention is to provide improved methods and apparatus for digital data processing and, more particularly, for identifying subsets of related data in a data set.

[0006] A related object is to provide such methods and apparatus as facilitate enterprise business visibility and insight.

[0007] A further object is to provide such methods and apparatus as can rapidly identify subsets of related data in a data set, e.g., in response to user directives or otherwise.

[0008] A further object of the invention is to provide such methods and apparatus as can be readily and inexpensively implemented.

SUMMARY OF THE INVENTION

[0009] The foregoing are among the objects attained by the invention which provides, in one aspect, a method for identifying related data in a directed graph, such as an RDF data set. A “first” step (though the steps are not necessarily executed in sequential order) includes identifying (or marking) as related data expressly satisfying a criteria (e.g., specified by a user). A “second” step includes identifying as related ancestors of any data identified as related, e.g., in the first step, unless that ancestor conflicts with the criteria. A “third” step of the method is identifying descendents of any data identified, e.g., in the prior steps, unless that descendent conflicts with the criteria or has a certain relationship with the ancestor from which it descends. The method generates, e.g., as output, an indication of each of the nodes identified as related in these steps.

[0010] By way of example, in the first step, a method according to this aspect of the invention can identify nodes in the directed graph that explicitly match a criteria in the form field1=value1, where field1 is a characteristic (or attribute) of one or more of the nodes and value1 is a value of the specific characteristic (or attribute). Of course, criteria are specific to the types of data in the data set and can be more complex, including for example, Boolean expressions and operators, wildcards, and so forth. Thus, for example, a criteria of a data set composed of RDF triples might be of the form predicate=CTO and object=Colin, which identifies, as related, triples having a predicate “CTO” and an object “Colin.”
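
By way of non-limiting illustration only, the first step can be sketched in Python as follows. The sketch is not taken from the illustrated embodiment: the function name matches, the dictionary representation of a node, and the use of shell-style wildcards (via the standard fnmatch module) are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def matches(node: dict, criteria: dict) -> bool:
    """Return True if every field named in the criteria matches the node.

    Criteria values may contain shell-style wildcards such as * and ?.
    A field that the node lacks is treated as a non-match (an assumption).
    """
    return all(
        field in node and fnmatchcase(str(node[field]), pattern)
        for field, pattern in criteria.items()
    )


# A triple expressed as a node with three attributes (hypothetical layout).
triple = {"subject": "company://id#1", "predicate": "CTO", "object": "Colin"}

# The criteria predicate=CTO and object=Colin from the example above.
criteria = {"predicate": "CTO", "object": "Colin"}
is_related = matches(triple, criteria)
```

Boolean combinations of such tests could be layered on top of this function; the sketch covers only a conjunction of field=value terms.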

[0011] By way of further example, in the second step, the method “walks” up the directed graph from each node identified as related in the first step (or any of the steps) to find ancestor nodes. Each of these is identified as related unless it conflicts with the criteria. To continue the example, if the first step marks as related a first RDF triple that matches the criteria predicate=CTO and object=Colin, the second step marks as related a second, parent triple whose object is the subject of the first triple, unless that second (or parent) triple otherwise conflicts with the criteria, e.g., has another object specifying that Dave is the CTO.

[0012] By way of further example, in the third step, the method walks down the directed graph from each node identified in the previously described steps (or any of the steps) to find descendent nodes. Each of these is identified as related unless (i) it conflicts with the criteria or (ii) its relationship with the ancestor from which walking occurs is of the same type as the relationship that ancestor has with a child, if any, from which the ancestor was identified by operation of the second step. To continue the example, if the first step marks as related a first RDF triple that matches the criteria predicate=CTO and object=Colin and the second step marks as related a second, parent triple whose object is the subject of the first triple via a predicate relationship “Subsidiary,” the third step marks as related a third, descendent triple whose subject is the object of the second, parent triple, unless that descendent triple conflicts with the criteria (e.g., has a predicate-object pair specifying that Dave is the CTO) or unless its relationship with the parent triple is also defined by a predicate relationship of type “Subsidiary.”

[0013] As evident in the discussion above, according to some aspects of the invention, the data are defined by RDF triples and the nodes by subjects (or resource-type objects) of those triples. In other aspects, the data and nodes are of other data types, including, for example, meta directed graph data (of the type defined in one of the aforementioned incorporated-by-reference applications) where a node represents a plurality of subjects each sharing a named relationship with a plurality of objects represented by a node.

[0014] Still further aspects of the invention provide methods as described above in which the so-called first, second and third steps are executed in parallel, e.g., as by an expert system rule-engine. In other aspects, the steps are executed in series and/or iteratively.

[0015] In still further aspects of the invention, the invention provides methods for identifying related data in a directed graph by exercising only the first and second aforementioned steps. Other aspects provide such methods in which only the first and third such steps are exercised.

[0016] Still other aspects of the invention provide methods as described above in which the directed graph is made up of, at least in part, a data flow, e.g., of the type containing transactional or enterprise data. Related aspects provide such methods in which the steps are executed on a first portion of a directed graph and, then, separately on a second portion of the directed graph, e.g., as where the second portion reflects updates to a data set represented by the first portion.

[0017] These and other aspects are evident in the drawings and in the description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] A more complete understanding of the invention may be attained by reference to the drawings, in which:

[0019] FIG. 1 is a block diagram of a system according to the invention for identifying related data in a data set;

[0020] FIG. 2 depicts a data set suitable for processing by methods and apparatus according to the invention;

[0021] FIGS. 3-5 depict operation of the system of FIG. 1 on the data set of FIG. 2 with different criteria.

DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENT

[0022] FIG. 1 depicts a system 8 according to the invention for identifying and/or generating (collectively, “identifying”) a subset of a directed graph, namely, that subset matching or related to a criteria. The embodiment (and, more generally, the invention) is suited for use inter alia in generating subsets of RDF data sets consolidated from one or more data sources, e.g., in the manner described in the following copending, commonly assigned applications, the teachings of which are incorporated herein by reference:

[0023] U.S. patent application Ser. No. 09/917,264, filed Jul. 27, 2001, entitled “Methods and Apparatus for Enterprise Application Integration,”

[0024] U.S. patent application Ser. No. 10/051,619, filed Oct. 29, 2001, entitled “Methods And Apparatus For Real-time Business Visibility Using Persistent Schema-less Data Storage,”

[0025] U.S. Patent Application Serial No. 60/332,219, filed Nov. 21, 2001, entitled “Methods And Apparatus For Calculation And Reduction Of Time-series Metrics From Event Streams Or Legacy Databases In A System For Real-time Business Visibility,” and

[0026] U.S. Patent Application Serial No. 60/332,053, filed Nov. 21, 2001, entitled “Methods And Apparatus For Querying A Relational Database Of RDF Triples In A System For Real-time Business Visibility.”

[0027] The embodiment (and, again, more generally, the invention) is also suited inter alia for generating subsets of “meta” directed graphs of the type described in copending, commonly assigned application U.S. patent application Ser. No. 10/138,725, filed May 3, 2002, entitled “Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets,” the teachings of which are incorporated herein by reference.

[0028] The illustrated system 8 includes a module 12 that executes a set of rules 18 with respect to a set of facts 16 representing criteria in order to generate a subset 20 of a set of facts 10 representing an input data set, where that subset 20 represents those input data facts that match the criteria or are related thereto. For simplicity, in the discussion that follows the set of facts 16 representing criteria are referred to as “criteria” or “criteria 16,” while the set of facts 10 representing data are referred to as “data” or “data 10.” The illustrated system 8 is implemented on a general- or special-purpose digital data processing system, e.g., a workstation, server, mainframe or other digital data processing system of the type conventionally available in the marketplace, configured and operated in accord with the teachings herein. Though not shown in the drawing, the digital data processing system can be coupled for communication with other such devices, e.g., via a network or otherwise, and can include input/output devices, such as a keyboard, pointing device, display, printer and the like.

[0029] Illustrated module 12 is an executable program (compiled, interpreted or otherwise) embodying the rules 18 and operating in the manner described herein for identifying subsets of directed graphs. In the illustrated embodiment, module 12 is implemented in Jess (Java Expert System Shell), a rule-based expert system shell, commercially available from Sandia National Laboratories. However, it can be implemented using any other “expert system” engine, if-then-else network, or other software, firmware and/or hardware environment (whether or not expert system-based) suitable for adaptation in accord with the teachings hereof.

[0030] The module 12 embodies the rules 18 in a network representation 14, e.g., an if-then-else network, or the like, native to the Jess environment. The network nodes are preferably executed so as to effect substantially parallel operation of the rules 18, though they can be executed so as to effect serial and/or iterative operation as well or in addition. In other embodiments, the rules are represented in accord with the specifics of the corresponding engine, if-then-else network, or other software, firmware and/or hardware environment on which the embodiment is implemented. These likewise preferably effect parallel execution of the rules 18, though they may effect serial or iterative execution instead or in addition.

[0031] The data set 10 is a directed graph, e.g., a collection of nodes representing data and directed arcs connecting nodes to one another. As used herein, a node at the source of an arc is referred to as an “ancestor” (or “direct ancestor”), while the node at the target of the arc is referred to herein as a “descendent” (or “direct descendent”). In the illustrated embodiment, each arc has an associated type or name, e.g., in the manner of predicates of RDF triples, which, themselves, constitute and/or form directed graphs.
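
By way of non-limiting illustration, a directed graph with named arcs can be held in memory so that both direct ancestors and direct descendents of a node are easy to look up. The following Python sketch is an assumption for purposes of explanation, not the representation used by the illustrated embodiment; the function index_arcs and the sample arcs are hypothetical.

```python
from collections import defaultdict


def index_arcs(arcs):
    """Build lookup tables for walking a directed graph with named arcs.

    `arcs` is an iterable of (source, name, target) tuples, where the
    source node is the direct ancestor and the target node is the
    direct descendent.
    """
    descendents = defaultdict(list)  # source node -> [(arc name, target node)]
    ancestors = defaultdict(list)    # target node -> [(arc name, source node)]
    for source, name, target in arcs:
        descendents[source].append((name, target))
        ancestors[target].append((name, source))
    return descendents, ancestors


# Hypothetical arcs used only for this example.
arcs = [("A", "owns", "B"), ("B", "employs", "C")]
descendents, ancestors = index_arcs(arcs)
```

Keeping the arc name alongside each neighbor matters here because the rules described below compare the names of arcs, not merely their endpoints.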

[0032] By way of example, in addition to RDF triples, the data set 10 can comprise data structures representing a meta directed graph of the type disclosed in copending, commonly assigned U.S. patent application Ser. No. 10/138,725, filed May 3, 2002, entitled “Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets,” e.g., at FIGS. 4A-6B and accompanying text, all of which are incorporated herein by reference.

[0033] Alternatively or in addition, the data set 10 can comprise RDF triples of the type conventionally known in the art and described, for example, in Resource Description Framework (RDF) Model and Syntax Specification (Feb. 22, 1999). Briefly, RDF is a way of expressing the properties of items of data. Those items are referred to as subjects or resources. Their properties are referred to as predicates. And, the values of those properties are referred to as objects. In RDF, an expression of a property of an item is referred to as a triple, a convenience reflecting that the expression contains three parts: subject, predicate and object. Subjects can be anything that is described by an RDF expression. A predicate identifies a property of a subject. An object gives a “value” of a property. Objects can be literals, i.e., strings that identify or name the corresponding property (predicate). They can also be resources.

[0034] The data set 10 may be stored on disk for input to module 12. Alternatively, or in addition, the data set may be a data flow, e.g., a stream of data (real-time or otherwise) originating from e-commerce, point-of-sale or other transactions or sources (whether or not business- or enterprise-oriented). Moreover, the data set may comprise multiple parts, each operated on by module 12 at different times, for example, a first part representing a database and a second part representing updates to that database.

[0035] Criteria 16 contains expressions including, for example, literals, wildcards, Boolean operators and so forth, against which nodes in the data set are tested. In embodiments that operate on RDF data sets, the criteria can specify subject, predicate and/or object values or other attributes. In embodiments that operate on directed graphs of other types, other appropriate values and attributes may be specified. Criteria can be input by a user, e.g., from a user interface, e.g., on an ad hoc basis. Alternatively or in addition, they can be stored and re-used, such as where numerous data sets exist to which the same criteria are applied. Further, the criteria 16 can be generated dynamically, e.g., via other software (or hardware) applications.

[0036] Rules 18 define the tests for identifying data in the data set 10 that match the criteria or that are related thereto. These are expressed in terms of the types and values of the data items as well as their interrelationships or connectedness.

[0037] Rules applicable to a data set comprised of RDF triples can be expressed as follows:

Rule No. 0 (“Criteria Rule”)
Purpose: Match criteria to triples in data set.
Rule: If triple's object is a literal, identify triple as related if both triple's predicate and the object match those specified in the criteria. If triple's object is a resource, identify triple as related if triple's predicate matches that specified in criteria, if any, and if triple's object matches that specified in criteria.

Rule No. 1 (“Sibling Rule”)
Purpose: Find as related other triples at the same level.
Rule: Identify as related a triple that shares the same subject (i.e., siblings), except those siblings that have the same predicate as that specified in the criteria.

Rule No. 2 (“Ancestor Rule”)
Purpose: Walk up the directed graph to find valid triples.
Rule: Identify as related a triple that is a direct ancestor of a triple identified by any of the other rules and that is not in substantial conflict with the criteria. For purposes hereof, a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple; a triple whose subject is the object of another triple is deemed a direct descendent of that other triple.

Rule No. 3 (“Descendent Rule”)
Purpose: Walk down the directed graph to find valid triples.
Rule: Identify as related a triple (hereinafter “identified descendent”) that is a direct descendent of a triple (hereinafter “identified ancestor”) identified as related by any of the other rules and which identified descendent
(a) is not associated with the identified ancestor via a predicate substantially matching a predicate named in the criteria, if any, and
(b) is not in substantial conflict with the criteria;
(c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified during execution of the Ancestor Rule.
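
By way of non-limiting illustration, the four rules above might be exercised serially and iteratively as in the following Python sketch. It is not the Jess implementation of the illustrated embodiment, and the names Triple, conflicts and find_related are hypothetical. The sketch makes several simplifying assumptions: the criteria is a single predicate/object pair; “substantial conflict” is read narrowly as a triple having the criteria's predicate but a different object; and each triple is treated as a named arc from its subject to its object, so that the Descendent Rule walks down from any subject or object of a related triple.

```python
from typing import Iterable, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def conflicts(t: Triple, crit_pred: str, crit_obj: str) -> bool:
    # Assumed reading of "substantial conflict": same predicate, other object.
    return t.predicate == crit_pred and t.obj != crit_obj


def find_related(triples: Iterable[Triple], crit_pred: str, crit_obj: str) -> Set[Triple]:
    triples = list(triples)

    # Rule 0 (Criteria Rule): triples matching the predicate and the object.
    matched = {t for t in triples if t.predicate == crit_pred and t.obj == crit_obj}
    related = set(matched)

    # Rule 1 (Sibling Rule): same subject as a match, different predicate.
    matched_subjects = {t.subject for t in matched}
    related |= {t for t in triples
                if t.subject in matched_subjects and t.predicate != crit_pred}

    # Node -> predicates by which the Ancestor Rule reached that node.
    reached_via = {}
    changed = True
    while changed:  # re-apply Rules 2 and 3 until nothing new is identified
        changed = False

        # Rule 2 (Ancestor Rule): object is the subject of a related triple.
        subjects = {t.subject for t in related}
        for a in triples:
            if a.obj in subjects and not conflicts(a, crit_pred, crit_obj):
                preds = reached_via.setdefault(a.subject, set())
                if a.predicate not in preds:
                    preds.add(a.predicate)
                    changed = True
                if a not in related:
                    related.add(a)
                    changed = True

        # Rule 3 (Descendent Rule): arcs leaving any already-identified node.
        nodes = {t.subject for t in related} | {t.obj for t in related}
        for d in triples:
            if d in related or d.subject not in nodes:
                continue
            if d.predicate == crit_pred:                          # condition (a)
                continue
            if conflicts(d, crit_pred, crit_obj):                 # condition (b)
                continue
            if d.predicate in reached_via.get(d.subject, set()):  # condition (c)
                continue
            related.add(d)
            changed = True
    return related
```

Under the narrow reading of conflict assumed here, condition (b) is already implied by condition (a); it is kept as a separate test so that a broader conflict test can be substituted. Applied to the eight triples tabulated in paragraph [0047] with the criteria predicate=CTO and object=Colin, the sketch identifies every triple except the one relating company://id#3 to company://id#4 through the customer predicate, which is consistent with the walk-through of FIG. 3 given later in this description. A rule engine that executes the rules in parallel may visit triples in a different order.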

[0038] As used above and throughout, “substantial conflict” means conflict that is direct or otherwise material in regard to determining related data vis-a-vis the use for which the invention is employed (e.g., as determined by default in an embodiment and/or by selection made by a user thereof). By way of non-limiting example, for some uses (and/or embodiments) differences of any sort between the object of an RDF triple and that specified in a criteria are material, while for other uses (and/or embodiments) differences with respect to suffix, case and/or tense are immaterial. Those skilled in the art will appreciate that for other uses and/or embodiments, factors other than suffix, case and/or tense may be used in determining materiality or lack thereof.
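
By way of non-limiting illustration, the notion that materiality can vary by use or embodiment can be sketched by making the comparison function a parameter. In the following Python sketch, the names make_conflict_test, strict and ignore_case are hypothetical, and only differences of case are handled; treatment of suffix or tense would require additional normalization that is not shown.

```python
def make_conflict_test(normalize=lambda text: text):
    """Return a conflict test that compares values after normalization.

    With the default (identity) normalization, any difference between a
    triple's object and the criteria's object is treated as material.
    """
    def in_conflict(triple, criteria):
        _subject, predicate, obj = triple
        crit_predicate, crit_object = criteria
        same_property = normalize(predicate) == normalize(crit_predicate)
        return same_property and normalize(obj) != normalize(crit_object)
    return in_conflict


strict = make_conflict_test()                   # every difference is material
ignore_case = make_conflict_test(str.casefold)  # differences of case are not

# A hypothetical triple whose object differs from the criteria only in case.
triple = ("company://id#1", "CTO", "colin")
criteria = ("CTO", "Colin")
strict_result = strict(triple, criteria)
relaxed_result = ignore_case(triple, criteria)
```

The design choice shown is simply to pass the policy in as a function, so that a default can be set by an embodiment and overridden by a user selection.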

[0039] Rules applicable to other directed graphs (e.g., not comprised of RDF triples) can be expressed as shown below. As noted above, these other directed graphs can include the aforementioned meta directed graphs, by way of non-limiting example. It will be appreciated that the rules which follow are functionally equivalent to those expressed above. However, they take into account that the data nodes in those other directed graphs may have attributes in addition to those represented in their connectedness to other data nodes. To this end, the aforementioned Sibling Rule is subsumed in those aspects of the rules that follow which call for testing each data node to determine whether it conflicts with the criteria.

Rule No. 0 (“Criteria Rule”)
Purpose: Match criteria to data in data set.
Rule: Identify as related data substantially matching a criteria;

Rule No. 1 (“Ancestor Rule”)
Purpose: Walk up the directed graph to find valid data.
Rule: Identify as related data that is a direct ancestor of data identified in any of these rules, and that is not in substantial conflict with the criteria;

Rule No. 2 (“Descendent Rule”)
Purpose: Walk down the directed graph to find valid data.
Rule: Identify as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of these rules, and which identified descendent:
(a) Does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and
(b) Is not in substantial conflict with the criteria; and
(c) Does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of Rule 1.

[0040] Referring back to FIG. 1, the related data 20 output or otherwise generated by module 12 represents those nodes or triples identified as “related” during exercise of the rules. The data 20 can be output in the same form as the input data or some alternate form, e.g., pointers or other references to identified data within the data set 10. In some embodiments, it can be displayed via a user interface or printed, or digitally communicated to further applications for additional processing, e.g., via a network or the Internet. In one non-limiting example, the related data 20 can be used to generate mailings or to trigger message events.

[0041] In operation, the module 12 is loaded with rules 18. In the illustrated embodiment, this is accomplished via compilation of source code embodying those rules (expressed above in pseudo code) in the native or appropriate language of the expert system engine or other environment in which the module is implemented. See, step A. Of course, those skilled in the art will appreciate that, alternatively, rules in source code format can be retrieved at run time and interpreted instead of compiled.

[0042] The criteria 16 are then supplied to the module 12. See, step B. These can be entered by an operator, e.g., via a keyboard or other input device. Alternatively, or in addition, they can be retrieved from disk or input from another application (e.g., a messaging system) or device, e.g., via network, interprocess communication or otherwise.

[0043] The data set 10 is applied to the module 12 in step C. The data set 10 can be as described above, to wit, an RDF data set or other directed graph stored in a data base or contained in a data stream, or otherwise. The data set can be applied to the module 12 via conventional techniques known in the art, e.g., retrieval from disk, communication via network, or via any other technique capable of communicating a data set to a digital application.

[0044] In step D, the module 12 uses the rules 18 to apply the criteria 16 to the data set 10. In the illustrated embodiment, by way of non-limiting example, this step is executed via the network 14 configured (via the rules engine) in accord with the rules. In other embodiments, this step is executed via the corresponding internal representation of those rules.

[0045] Triples (in the case of RDF data sets) or data (in the case of data sets comprising other types of directed graphs) identified by the module as “related” (meaning, in the context hereof, that those triples match the criteria or are related thereto) are output as “identified data” in Step D. As described above, the output can be a list or other tabulation of identified data 20, or it can be a pointer or reference to that data, for example, a reference to a location within the data set 10.

[0046] In some embodiments, the output of identified data 20 can be stored for future use, e.g., for use with a mail-merge or other applications. In other embodiments, it can be digitally communicated to other data base systems or information repositories. Still further, in some embodiments, it can be added to a data base containing other related data, or even replace portions of that data base.

[0047] The table below lists a directed graph (here, the triples of an RDF data set) of the type suitable for processing by module 12 to identify data matching a criteria and related thereto. It will be appreciated that in practice, directed graphs processed by module 12 may contain hundreds, thousands or more nodes, e.g., as would be typical for an RDF set representing transactional and enterprise-related data. Moreover, it will be appreciated that the directed graphs and/or triples are typically expressed in a conventional data format (e.g., XML), or otherwise, for transfer to and from the module 12.

Subject           Predicate   Object
company://id#3    customer    company://id#1
company://id#3    customer    company://id#4
company://id#3    customer    company://id#2
company://id#1    employee    Howard
company://id#1    employee    Alan
company://id#1    CTO         Colin
company://id#2    employee    David
company://id#2    CTO         Colin

[0048] FIG. 2 is a graphical depiction of this directed graph, i.e., RDF data set. Per convention, subjects and resource-type objects are depicted as oval-shaped nodes; literal-type objects are depicted as rectangular nodes; and predicates are depicted as arcs connecting those nodes.
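
By way of non-limiting illustration, the triples listed above can be written out in Python and their nodes separated into resources and literals, mirroring the oval and rectangular nodes of the drawing. The names TRIPLES and classify_nodes are hypothetical, and the test used to recognize a resource (the presence of “://” in the value) is an assumption that happens to suit this data set; it is not a general RDF rule.

```python
# The eight triples of the example data set, as (subject, predicate, object).
TRIPLES = [
    ("company://id#3", "customer", "company://id#1"),
    ("company://id#3", "customer", "company://id#4"),
    ("company://id#3", "customer", "company://id#2"),
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
    ("company://id#2", "employee", "David"),
    ("company://id#2", "CTO", "Colin"),
]


def classify_nodes(triples, is_resource=lambda value: "://" in value):
    """Split the graph's nodes into resources and literals.

    Every subject is a resource; an object is a resource only if the
    (assumed) is_resource test says so, and is otherwise a literal.
    """
    resources, literals = set(), set()
    for subject, _predicate, obj in triples:
        resources.add(subject)
        (resources if is_resource(obj) else literals).add(obj)
    return resources, literals


resources, literals = classify_nodes(TRIPLES)
```

Note that the literal Colin appears once in the resulting set even though two triples name it as their object.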

[0049] FIG. 3 depicts application by module 12 of criteria on the data set shown in FIG. 2 using the above-detailed rules, specifically, those of the RDF type. The criteria is predicate=CTO and object=Colin. The depiction is simplified insofar as it shows execution of the rules serially: in practice, a preferred module 12 implemented in a rules engine (such as Jess) executes the rules in accord with the engine's underlying algorithm (e.g., a Rete algorithm as disclosed by Forgy, “Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem,” Artificial Intelligence, 19 (1982) 17-37, by http://herzberg.ca.sandia.gov/jess/docs/52/rete.html; or other underlying algorithm).

[0050] In a sequence of twelve frames, the depiction shows successive identification of triples as “related” (i.e., matching the criteria or related thereto) as each rule is applied or re-applied. The illustrated sequence proceeds from left-to-right then top-to-bottom, as indicated by the dashed-line arrows. For sake of simplicity, the data set is depicted in abstract in each frame, i.e., by a small directed graph of identical shape as that of FIG. 2, but without the labels. Triples identified as related are indicated in black.

[0051] Referring to the first frame of FIG. 3, the module 12 applies the Criteria Rule to the data set. Because the company://id#1—CTO—Colin triple matches the criteria (to repeat, predicate=CTO and object=Colin), it is identified as “related” and marked accordingly.

[0052] In the second frame, the module applies the Sibling Rule to find triples at the same level as the one(s) previously identified by the Criteria Rule. In this instance, the company://id#1—employee—Howard and company://id#1—employee—Alan triples are identified and marked accordingly.

[0053] In the third frame, the module applies the Ancestor Rule to walk up the directed graph to find ancestors of the triples previously identified as related. In this instance, the company://id#3—customer—company://id#1 triple is identified and marked accordingly.

[0054] In the fourth frame, the module applies the Descendent Rule to walk down the directed graph to find descendents of the triples previously identified as related. No triples are selected since both company://id#3—customer—company://id#2 and company://id#3—customer—company://id#4 share the same predicate as company://id#3—customer—company://id#1. Referring back to the detailed rules, company://id#2, by way of example, is a direct descendent that has a predicate (to wit, customer) connecting it with its identified direct ancestor (to wit, company://id#3) which matches a predicate that ancestor (to wit, company://id#3) has with a direct descendent (to wit, company://id#1) via which that direct ancestor (to wit, company://id#3) was identified during the execution of the Ancestor Rule.

[0055] In frames 5-8, the module 12 reapplies the rules, this time beginning with a Criteria Rule match of company://id#2—CTO—Colin. In frames 9-12, the module 12 finds no further matches upon reapplication of the rules.

[0056] FIG. 4 parallels FIG. 3, showing however application by module 12 of the criteria predicate=employee and object=Alan to the data set of FIG. 2. Only eight frames are shown since module 12 finds no further matches during execution of the rules represented in the final four frames.

[0057] Of note in FIG. 4 is frame two. Here, application of the Sibling Rule by module 12 does not result in identification of all of the siblings of company://id#1—employee—Alan (which had been identified as related in the prior execution of the Criteria Rule). This is because one of the siblings, company://id#1—employee—Howard, has the same predicate as that specified in the criteria. Accordingly, that triple is not identified or marked as related.
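
By way of non-limiting illustration, the exclusion just described follows directly from the Sibling Rule as stated in the rules table. The following Python sketch applies only that rule to the triples whose subject is company://id#1; the function name sibling_rule and the tuple layout are hypothetical, and the sketch is not the Jess rule of the illustrated embodiment.

```python
def sibling_rule(triples, matched, criteria_predicate):
    """Return siblings of the matched triples per the Sibling Rule.

    A sibling shares a subject with a matched triple; siblings having
    the same predicate as the criteria are excluded.
    """
    subjects = {subject for subject, _predicate, _obj in matched}
    return [
        t for t in triples
        if t[0] in subjects and t not in matched and t[1] != criteria_predicate
    ]


# Triples of the example data set having company://id#1 as subject.
company_1 = [
    ("company://id#1", "employee", "Howard"),
    ("company://id#1", "employee", "Alan"),
    ("company://id#1", "CTO", "Colin"),
]

# The Criteria Rule match for predicate=employee and object=Alan.
matched = [("company://id#1", "employee", "Alan")]
siblings = sibling_rule(company_1, matched, "employee")
```

Here the Howard triple is left out because its predicate, employee, is the predicate named in the criteria, while the CTO triple is returned as a sibling.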

[0058] FIG. 5 also parallels FIG. 3, showing however application by module 12 of the criteria resource=company://id#1 to the data set of FIG. 2. Again, only eight frames are shown since module 12 finds no further matches during execution of the rules represented in the final four frames. Of note in FIG. 5 are the identifications effected by specification of a resource as a criteria.

[0059] A further understanding of these examples may be attained by reference to Appendices A and B, filed herewith, which provide XML/RDF listings of the data sets and criteria, and which also show rule-by-rule identification (or “validation”) of the triples.

[0060] Though the examples show application of the rules by module 12 to an RDF data set, it will be appreciated that alternate embodiments of the module can likewise apply the rules to data sets representing the meta directed graphs disclosed in copending, commonly assigned application U.S. patent application Ser. No. 10/138,725, filed May 3, 2002, entitled “Methods And Apparatus for Visualizing Relationships Among Triples of Resource Description Framework (RDF) Data Sets,” the teachings of which are incorporated herein by reference.

[0061] Described above are methods and apparatus meeting the desired objects. Those skilled in the art will, of course, appreciate that these are merely examples and that other embodiments, incorporating modifications to those described herein, fall within the scope of the invention, of which

We claim:
 1. A method for identifying related data in a directed graph, comprising: A. executing the sub-steps of (i) identifying as related data substantially matching a criteria; (ii) identifying as related data that is a direct ancestor of data identified in any of sub-steps (i), (ii) and (iii), and that is not in substantial conflict with the criteria; (iii) identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of sub-steps (i), (ii) and (iii), and which identified descendent (a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and (b) is not in substantial conflict with the criteria; (c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii), B. generating an indication of data identified as related in step (A).
 2. The method of claim 1, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein sub-step (ii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of attributes of the direct ancestor, and a relationship between the direct ancestor and any data that descends therefrom, in order to determine whether the direct ancestor is in substantial conflict with the criteria.
 3. The method of claim 1, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein sub-step (iii) includes comparing at least one of the relationship and the characteristic named in a criteria with any of attributes of the identified descendent, and a relationship between the identified descendent and any data that descends therefrom, in order to determine whether the identified descendent is in substantial conflict with the criteria.
 4. The method of claim 1, comprising executing any of the sub-steps of step (A) any of serially, in parallel, or recursively.
 5. The method of claim 1, further comprising executing any of the sub-steps of step (A) using a rule-based engine.
 6. The method of claim 5, wherein the rule-based engine uses a Rete algorithm to effect execution of one or more of the sub-steps of step (A).
 7. The method of claim 1, wherein the directed graph comprises a data flow.
 8. The method of claim 7, wherein the data flow comprises any of transactional information and enterprise-related information.
 9. The method of claim 1, comprising executing step (A) with respect to a first data set representing a first portion of the directed graph, and executing step (A) separately with respect to a second data set representing a second portion of the directed graph.
 10. The method of claim 9, wherein the second data set comprises an update to the first data set.
 11. A method for identifying related data in a directed graph, comprising: A. executing the sub-steps of (i) identifying as related data substantially matching a criteria; (ii) identifying as related data that is a direct ancestor of data identified as related in any of sub-steps (i) and (ii), and that is not in substantial conflict with the criteria; B. generating an indication of data identified as related in step (A).
 12. The method of claim 11, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein sub-step (ii) includes comparing at least one of the relationship and the characteristic named in the criteria with any of attributes of the direct ancestor, and a relationship between the direct ancestor and any data that descends therefrom, in order to determine whether the direct ancestor is in substantial conflict with the criteria.
 13. The method of claim 11, wherein the directed graph comprises a data flow.
 14. The method of claim 13, wherein the data flow comprises any of transactional information and enterprise-related information.
 15. A method for identifying related data in a directed graph, comprising: A. executing the sub-steps of (i) identifying as related data substantially matching a criteria; (ii) identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified in any of sub-steps (i) and (ii), and which identified descendent (a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and (b) is not in substantial conflict with the criteria; (c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified as related.
 16. The method of claim 15, wherein the criteria specifies a named relationship and a characteristic of that named relationship, and wherein sub-step (ii) includes comparing at least one of the relationship and the characteristic named in the criteria with any of attributes of the identified descendent, and a relationship between the identified descendent and any data that descends therefrom, in order to determine whether the identified descendent is in substantial conflict with the criteria.
 17. The method of claim 15, wherein the directed graph comprises a data flow.
 18. The method of claim 17, wherein the data flow comprises any of transactional information and enterprise-related information.
 19. The method of claim 15, comprising executing step (A) with respect to a first data set representing a first portion of the directed graph, and executing step (A) separately with respect to a second data set representing a second portion of the directed graph.
 20. The method of claim 19, wherein the second data set comprises an update to the first data set.
 21. A method for identifying related triples in a resource description framework (RDF) data set, comprising A. executing with respect to the data set the sub-steps of (i) identifying as related a triple substantially matching a criteria; (ii) identifying as related a triple that is a direct ancestor of a triple identified as related in any of sub-steps (i), (ii) and (iii), and that is not in substantial conflict with the criteria, where, for purposes hereof, a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple, and, conversely, where a triple whose subject is the object of another triple is deemed a direct descendent of that other triple; (iii) identifying as related a triple (hereinafter “identified descendent”) that is a direct descendent of a triple (hereinafter “identified ancestor”) identified as related in any of sub-steps (i), (ii) and (iii), and which identified descendent (a) is not associated with the identified ancestor via a predicate substantially matching a predicate named in the criteria, if any, and (b) is not in substantial conflict with the criteria; (c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii), B. generating an indication of triples identified as related in step (A).
 22. The method of claim 21, comprising identifying as related a triple that is a sibling of a triple identified as related in sub-step (i) and that is not in substantial conflict with the criteria, where, for purposes hereof, triples that share a common subject are deemed siblings.
 23. The method of claim 21, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (ii) includes comparing at least one of the predicate and object specified in the criteria with the direct ancestor in order to determine whether the direct ancestor is in substantial conflict with the criteria.
 24. The method of claim 21, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (iii) includes comparing at least one of the predicate and object specified in the criteria with the identified descendent in order to determine whether the identified descendent is in substantial conflict with the criteria.
 25. The method of claim 21, comprising executing any of the sub-steps of step (A) any of serially, in parallel, or recursively.
 26. The method of claim 21, further comprising executing any of the sub-steps of step (A) using a rule-based engine.
 27. The method of claim 26, wherein the rule-based engine uses a Rete algorithm to effect execution of one or more of the sub-steps of step (A).
 28. The method of claim 21, wherein the data set comprises a data flow.
 29. The method of claim 28, wherein the data flow comprises any of transactional information and enterprise-related information.
 30. The method of claim 21, comprising executing step (A) with respect to a first data set of RDF triples, and executing step (A) separately with respect to a second, related data set of RDF triples.
 31. The method of claim 30, wherein the second data set comprises an update to the first data set.
 32. A method for identifying related triples in a resource description framework (RDF) data set, comprising A. executing with respect to the data set the sub-steps of (i) identifying as related data substantially matching a criteria; (ii) identifying as related a triple that is a direct ancestor of a triple identified in any of sub-steps (i) and (ii), and that is not in substantial conflict with the criteria, where, for purposes hereof, a triple whose object is the subject of another triple is deemed a direct ancestor of that other triple; a triple whose subject is the object of another triple is deemed a direct descendent of that other triple; B. generating an indication of data identified as related in step (A).
 33. The method of claim 32, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (ii) includes comparing at least one of the predicate and object specified in the criteria with the direct ancestor in order to determine whether the direct ancestor is in substantial conflict with the criteria.
 34. The method of claim 33, wherein the data set comprises a data flow.
 35. The method of claim 34, wherein the data flow comprises any of transactional information and enterprise-related information.
 36. A method for identifying related triples in a resource description framework (RDF) data set, comprising A. executing with respect to the data set the sub-steps of (i) identifying as related data substantially matching a criteria; (ii) identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of sub-steps (i) and (ii), and which identified descendent (a) is not associated with the identified ancestor via a predicate substantially matching a predicate named in the criteria, if any, and (b) is not in substantial conflict with the criteria; (c) is not associated with the identified ancestor via a predicate matching a predicate by which the identified ancestor is associated with a triple, if any, as a result of which the identified ancestor was identified as related, B. generating an indication of data identified as related in step (A).
 37. The method of claim 36, wherein the criteria specifies a predicate and an object associated with that predicate, and wherein sub-step (ii) includes comparing at least one of the predicate and object specified in the criteria with the identified descendent in order to determine whether the identified descendent is in substantial conflict with the criteria.
 38. The method of claim 36, wherein the data set comprises a data flow.
 39. The method of claim 38, wherein the data flow comprises any of transactional information and enterprise-related information.
 40. The method of claim 36, comprising executing step (A) with respect to a first data set of RDF triples, and executing step (A) separately with respect to a second, related data set of RDF triples.
 41. The method of claim 40, wherein the second data set comprises an update to the first data set.
 42. A method for identifying related data in a directed graph, comprising: A. executing the sub-steps of (i) identifying as related data that is a direct ancestor of data identified in any of sub-steps (i) and (ii), and that is not in substantial conflict with the criteria; (ii) identifying as related data (hereinafter “identified descendent”) that is a direct descendent of data (hereinafter “identified ancestor”) identified as related in any of sub-steps (i) and (ii) and which identified descendent (a) does not have a named relationship with the identified ancestor substantially matching a relationship named in the criteria, if any, and (b) is not in substantial conflict with the criteria; (c) does not have a named relationship with the identified ancestor matching a relationship the identified ancestor has with a data, if any, as a result of which the identified ancestor was identified during execution of sub-step (ii), B. generating an indication of data identified as related in step (A). 
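
By way of non-limiting illustration only, sub-steps (i), (ii) and (iii) of claim 21 might be sketched in Python as follows. The sketch is not the implementation of module 12 and is not part of the claims. It rests on several assumptions that do not come from the text: "substantially matching" is read as exact equality of predicate and object, "substantial conflict" is read as having the criteria's predicate with a different object, and the names Triple, Criteria, find_related and the sample data are hypothetical. Condition (c) is interpreted as excluding a descendent whose predicate equals the predicate of the triple that caused its ancestor to be identified in sub-step (ii).

```python
from typing import Dict, List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Criteria(NamedTuple):
    predicate: str
    obj: str


def matches(t: Triple, c: Criteria) -> bool:
    # Assumption: "substantially matching" is read as exact equality.
    return t.predicate == c.predicate and t.obj == c.obj


def conflicts(t: Triple, c: Criteria) -> bool:
    # Assumption: a conflict is the criteria's predicate with another object.
    return t.predicate == c.predicate and t.obj != c.obj


def is_direct_ancestor(a: Triple, d: Triple) -> bool:
    # Per claim 21: a triple whose object is the subject of another triple.
    return a.obj == d.subject


def find_related(triples: List[Triple], c: Criteria) -> Set[Triple]:
    # Sub-step (i): triples that match the criteria.
    related: Set[Triple] = {t for t in triples if matches(t, c)}
    # For each ancestor found in sub-step (ii), the predicates of the
    # triples that caused it to be identified (used by condition (c)).
    via: Dict[Triple, Set[str]] = {}
    changed = True
    while changed:  # repeat until no further triple is identified
        changed = False
        for t in triples:
            if t in related or conflicts(t, c):
                continue  # conflicting triples are never identified
            # Sub-step (ii): t is a direct ancestor of a related triple.
            causes = [r for r in related if is_direct_ancestor(t, r)]
            if causes:
                via[t] = {r.predicate for r in causes}
                related.add(t)
                changed = True
                continue
            # Sub-step (iii): t is a direct descendent of a related triple.
            for anc in list(related):
                if not is_direct_ancestor(anc, t):
                    continue
                if t.predicate == c.predicate:          # condition (a)
                    continue
                if t.predicate in via.get(anc, set()):  # condition (c)
                    continue
                related.add(t)
                changed = True
                break
    return related


# Hypothetical sample data, for illustration only.
TRIPLES = [
    Triple("order1", "hasItem", "item1"),
    Triple("item1", "color", "red"),
    Triple("item1", "size", "large"),
    Triple("order1", "hasItem", "item2"),
    Triple("item2", "color", "blue"),
    Triple("cust1", "placed", "order1"),
    Triple("order1", "date", "2002-05-03"),
]

related = find_related(TRIPLES, Criteria("color", "red"))
```

With the sample data and the criteria color = red, the sketch identifies the matching triple, its chain of ancestors (the hasItem and placed triples leading to item1), and the size and date descendents. The triple (order1, hasItem, item2) is excluded under the interpretation of condition (c) given above, and (item2, color, blue) is excluded as conflicting with the criteria. The loop simply repeats until nothing changes; a rule-based engine, as recited in claims 26 and 27, would be an alternative way to drive the same sub-steps.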